Processing Large Data Sets using a Cluster Computing Framework
Abstract
Growth across scientific disciplines has turned large data collections into important community resources. The volume of data of interest is already measured in terabytes and will soon reach petabytes. This research proposal addresses the problem of processing massive amounts of satellite data. A single LEO satellite sends around 2 GB of data every 24 hours. When processing this much data, conventional computers face constraints such as processing time, resources, and cost. A solution is needed that processes the data quickly and at low cost. Cluster computing is a network-based distributed environment that can provide fast processing for very large jobs. Cluster computing typically requires middleware. This proposal presents a middleware for handling the existing processing problems in distributed environments. In a typical heterogeneous computing environment, middleware can be employed to provide integration and interoperability among the underlying applications and services.
Similar resources
An Efficient Resource Allocation for Processing Healthcare Data in the Cloud Computing Environment
Nowadays, processing large-media healthcare data in the cloud has become an effective way of satisfying medical users' QoS (quality of service) demands. Providing healthcare for the community is a complex activity that relies heavily on information processing. Such processing can be very costly for organizations. However, processing healthcare data in the cloud has become an effective s...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. In a homogeneous Hadoop cluster, the rack-aware data placement strategy of the current Hadoop distributed file system assumes that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Scalable Data Clustering using GPU Clusters
The computational demands of multivariate clustering grow rapidly, so processing large data sets, like those found in flow cytometry, is very time-consuming on a single CPU. Fortunately, these techniques lend themselves naturally to large-scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA’s CUDA framework and Tesla arch...
2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework
Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...
A Distributed GPU-based Framework for real-time 3D Volume Rendering of Large Astronomical Data Cubes
We present a framework to interactively volume-render three-dimensional data cubes using distributed ray-casting and volume bricking over a cluster of workstations powered by one or more graphics processing units (GPUs) and a multi-core CPU. The main design target for this framework is to provide an in-core visualization solution able to provide three-dimensional interactive views of terabyte-s...